In [32]:
#@title
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
In [33]:
#@title

from google.colab import drive
drive.mount('/content/gdrive')
Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).

Looking back over the long course of history, philosophy has always played an essential part in the development of human society. Different schools of philosophical thought have emerged in different eras. Notable examples include Plato, Aristotle, Empiricism, and Rationalism. As a data scientist, I would like to look at philosophy through data and see whether a philosophy data set holds any interesting discoveries. Let's focus on the philosophy data.

In [34]:
#@title
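# Absolute path under the mount point, so the read does not depend on the current working directory.
df=pd.read_csv('/content/gdrive/My Drive/philosophy_data.csv')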
df.head()
Out[34]:
|   | title | author | school | sentence_spacy | sentence_str | original_publication_date | corpus_edition_date | sentence_length | sentence_lowered | tokenized_txt | lemmatized_str |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Plato - Complete Works | Plato | plato | What's new, Socrates, to make you leave your ... | What's new, Socrates, to make you leave your ... | -350 | 1997 | 125 | what's new, socrates, to make you leave your ... | ['what', 'new', 'socrates', 'to', 'make', 'you... | what be new , Socrates , to make -PRON- lea... |
| 1 | Plato - Complete Works | Plato | plato | Surely you are not prosecuting anyone before t... | Surely you are not prosecuting anyone before t... | -350 | 1997 | 69 | surely you are not prosecuting anyone before t... | ['surely', 'you', 'are', 'not', 'prosecuting',... | surely -PRON- be not prosecute anyone before ... |
| 2 | Plato - Complete Works | Plato | plato | The Athenians do not call this a prosecution b... | The Athenians do not call this a prosecution b... | -350 | 1997 | 74 | the athenians do not call this a prosecution b... | ['the', 'athenians', 'do', 'not', 'call', 'thi... | the Athenians do not call this a prosecution ... |
| 3 | Plato - Complete Works | Plato | plato | What is this you say? | What is this you say? | -350 | 1997 | 21 | what is this you say? | ['what', 'is', 'this', 'you', 'say'] | what be this -PRON- say ? |
| 4 | Plato - Complete Works | Plato | plato | Someone must have indicted you, for you are no... | Someone must have indicted you, for you are no... | -350 | 1997 | 101 | someone must have indicted you, for you are no... | ['someone', 'must', 'have', 'indicted', 'you',... | someone must have indict -PRON- , for -PRON- ... |

The preview shows that the data set has many features, and that each row holds one sentence together with its title, author, and school. This raised a question: is the amount of text published by a philosophical school related to the popularity of that school? To check this guess, let's look at how the rows are distributed over title, author, and school.

In [35]:
#@title
features_cat = ['title', 'author', 'school']


for f in features_cat:
    plt.figure(figsize=(18,5))
    df[f].value_counts().plot(kind='bar')
    plt.title(f)
    plt.grid()
    plt.show()

The plots show that the amount of text varies widely from author to author. Aristotle is far ahead, followed by Plato. Note that each bar counts rows of the data set, that is, sentences rather than separate works. From these counts alone, we cannot tell whether the volume of text published by a school is directly related to the popularity of that school. Let us turn our attention to the philosophical schools and see whether they show any interesting patterns.
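One way to separate "how many works" from "how much text" is to aggregate per school. The following is a minimal sketch, assuming each row of `df` is one sentence (as the `sentence_str` and `sentence_length` columns suggest). The helper name `summarize_schools` and the toy frame are illustrative and not part of the original notebook.

```python
import pandas as pd

def summarize_schools(frame: pd.DataFrame) -> pd.DataFrame:
    """Per school: distinct titles, distinct authors, and number of rows (sentences)."""
    summary = frame.groupby("school").agg(
        n_titles=("title", "nunique"),    # distinct works
        n_authors=("author", "nunique"),  # distinct authors
        n_sentences=("title", "count"),   # rows with a non-null title
    )
    return summary.sort_values("n_sentences", ascending=False)

# Toy stand-in for the real data, only to show the shape of the result.
toy = pd.DataFrame({
    "title": ["A", "A", "B", "C"],
    "author": ["X", "X", "X", "Y"],
    "school": ["s1", "s1", "s1", "s2"],
})
print(summarize_schools(toy))
```

In the notebook, `summarize_schools(df)` would give one row per school, which makes it easier to see whether a tall bar comes from many works or from a few long ones.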

In [36]:
#@title
schools = df.school.unique().tolist()
print(schools)
['plato', 'aristotle', 'empiricism', 'rationalism', 'analytic', 'continental', 'phenomenology', 'german_idealism', 'communism', 'capitalism', 'stoicism', 'nietzsche', 'feminism']

First, we list the philosophical schools that appear in the data. The result shows 13 schools. Could these schools be connected in ways we had not noticed before?
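One rough way to put a number on such a connection is to compare the most frequent words of each pair of schools. The sketch below uses Jaccard similarity on the top-k words. This is an illustrative choice and not a method used in the original notebook; the function names, the whitespace tokenization, and the toy input are all assumptions.

```python
from collections import Counter
from itertools import combinations

def top_words(sentences, k=50, stopwords=frozenset()):
    """Return the set of the k most frequent whitespace-separated tokens."""
    counts = Counter(
        word
        for sentence in sentences
        for word in sentence.split()
        if word not in stopwords
    )
    return {word for word, _ in counts.most_common(k)}

def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def school_overlap(sentences_by_school, k=50, stopwords=frozenset()):
    """Map each pair of schools to the Jaccard similarity of their top-k words."""
    tops = {s: top_words(sents, k, stopwords) for s, sents in sentences_by_school.items()}
    return {(a, b): jaccard(tops[a], tops[b]) for a, b in combinations(sorted(tops), 2)}

# Toy input (hypothetical sentences), only to show how the functions fit together.
toy = {
    "s1": ["the soul is good", "the good is one"],
    "s2": ["the mind is good"],
}
print(school_overlap(toy, k=10))
```

With the real data, the input could be built as `{sc: df[df.school == sc].sentence_lowered for sc in schools}`. Splitting on whitespace leaves punctuation attached to words, so the scores are only a coarse indication, and passing a stop-word set matters a great deal for the result.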

In [37]:
#@title
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
In [38]:
#@title
stopwords = set(STOPWORDS)
In [39]:
#@title


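# Keep only the sentences that belong to the feminism school.
df_temp = df[df.school=="feminism"]

print('School = "feminism"')

# Join all lower-cased sentences into one string and build the word cloud from it.
text = " ".join(txt for txt in df_temp.sentence_lowered)
wordcloud = WordCloud(stopwords=stopwords, max_font_size=50, max_words=500,
                      width = 600, height = 400,
                      background_color="white").generate(text)
# Show the word cloud as an image without axes.
plt.figure(figsize=(15,10))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()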
School = "feminism"

The word cloud shows that the high-frequency words in a school's texts reflect, to some extent, what is specific to that school. In Feminism, for example, the word "women" appears more frequently than other words, and gendered vocabulary such as "mother", "wife", and "husband" is also common.
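A word cloud shows relative frequency only through font size, so it can help to look at the actual counts as well. The following is a minimal sketch using `collections.Counter`; the function name `word_frequencies`, the simple regular-expression tokenizer, and the toy sentences and stop words are assumptions made for illustration, not part of the original notebook.

```python
from collections import Counter
import re

def word_frequencies(sentences, stopwords=frozenset(), top=10):
    """Count lower-cased tokens, skipping stop words, and return the `top` most common."""
    counts = Counter()
    for sentence in sentences:
        # Tokens are runs of letters and apostrophes; punctuation and digits are dropped.
        for token in re.findall(r"[a-z']+", sentence.lower()):
            if token not in stopwords:
                counts[token] += 1
    return counts.most_common(top)

# Toy data (hypothetical), only to show the call and the shape of the result.
toy_sentences = ["women and men", "the women of the city"]
toy_stopwords = {"and", "the", "of"}
print(word_frequencies(toy_sentences, toy_stopwords, top=2))
```

In the notebook, a call such as `word_frequencies(df_temp.sentence_lowered, stopwords, top=20)` would list the most frequent words for the school currently held in `df_temp`, which can then be compared with what the word cloud suggests.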

In [40]:
#@title
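# Draw one word cloud per school, using the same settings for each.
for sc in schools:
    # Keep only the sentences of the current school.
    df_temp = df[df.school==sc]

    print('School = ', sc.upper(), ':')

    # Join all lower-cased sentences into one string and build the word cloud from it.
    text = " ".join(txt for txt in df_temp.sentence_lowered)
    wordcloud = WordCloud(stopwords=stopwords, max_font_size=50, max_words=500,
                          width = 600, height = 400,
                          background_color="white").generate(text)
    plt.figure(figsize=(18,8))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()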
School =  PLATO :
School =  ARISTOTLE :
School =  EMPIRICISM :
School =  RATIONALISM :
School =  ANALYTIC :
School =  CONTINENTAL :
School =  PHENOMENOLOGY :
School =  GERMAN_IDEALISM :
School =  COMMUNISM :
School =  CAPITALISM :
School =  STOICISM :
School =  NIETZSCHE :
School =  FEMINISM :